Global analyses of UPF1 binding and function reveal 
expanded scope of nonsense-mediated mRNA decay 

Jessica A. Hurt, Alex D. Robertson, and Christopher B. Burge 1 

Department of Biology, Massachusetts Institute of Technology, Cambridge, Massachusetts 02142, USA 

UPF1 is a DNA/RNA helicase with essential roles in nonsense-mediated mRNA decay (NMD) and embryonic
development. How UPF1 regulates target abundance and the relationship between NMD and embryogenesis are not well
understood. To explore how NMD shapes the embryonic transcriptome, we integrated genome-wide analyses of UPF1
binding locations, NMD-regulated gene expression, and translation in murine embryonic stem cells (mESCs). We
identified over 200 direct UPF1 binding targets using crosslinking/immunoprecipitation-sequencing (CLIP-seq) and revealed
a repression pathway that involves 3' UTR binding by UPF1 and translation but is independent of canonical targeting 
features involving 3' UTR length and stop codon placement. Interestingly, NMD targeting of this set of mRNAs occurs in 
other mouse tissues and is conserved in human. We also show, using ribosome footprint profiling, that actively translated 
upstream open reading frames (uORFs) are enriched in transcription factor mRNAs and predict mRNA repression by
NMD, while poorly translated mRNAs escape repression. Together, our results identify novel NMD determinants and 
targets and provide context for understanding the impact of UPF1 and NMD on the mESC transcriptome. 



[Supplemental material is available for this article.] 

The multistep nature of eukaryotic gene expression and RNA 
processing enables multiple layers of regulation but also introduces 
more opportunities for error. Nonsense-mediated mRNA decay 
(NMD) is a highly conserved RNA surveillance pathway that 
oversees mRNA translation and targets those mRNAs harboring 
premature termination codons (PTCs) for decay, preventing the
cell from producing potentially deleterious truncated proteins. As 
a translation-dependent process, NMD is triggered when a ribo- 
some stalls at the termination codon (TC) of a target RNA and re- 
cruits the RNA helicase UPF1 (for review, see Kervestin and Jacobson 
2012). UPF1 is conserved in all studied eukaryotes and strictly re- 
quired for NMD activity (Leeds et al. 1991; for review, see Conti and 
Izaurralde 2005). The NMD pathway has important implications in 
human disease, as ~11% of disease-causing mutations result in the
production of nonsense-containing mRNAs (Mort et al. 2008) and 
frequently result in haploinsufficiency phenotypes (for review, see 
Kuzmiak and Maquat 2006). 

Interestingly, while NMD is traditionally considered to be 
required to prevent the translation of aberrant mRNAs that harbor 
mutations or result from errors in transcription or splicing, this
pathway is also implicated in regulating the expression of many 
normal ("wild-type") genes and mRNAs (for review, see Schweingruber 
et al. 2013). These include mRNAs harboring upstream open reading 
frames (uORFs), selenocysteine codons, long 3' UTRs, or alternative 
splicing events that generate isoforms with PTCs. While this last mode 
is used to regulate the levels of specific factors, particularly splicing 
factors (Lareau et al. 2007; Ni et al. 2007), in general, the regulation of 
and importance of this pathway's effects on wild-type gene expression 
remain poorly understood. A large fraction of the mammalian genome
appears to be regulated by NMD; two recent studies have
estimated that between one sixth and one quarter of mammalian
genes are affected by this pathway (McIlwain et al. 2010;
Weischenfeldt et al. 2012). Mice homozygous null for key NMD
factors die during embryogenesis (Medghalchi et al. 2001;
Weischenfeldt et al. 2008; McIlwain et al. 2010), suggesting that
aberrant expression of NMD target mRNAs may contribute to these 
phenotypes. However, distinguishing primary from secondary ef- 
fects of inhibition of the NMD pathway remains challenging, and 
how NMD activity regulates the mammalian transcriptome early 
in development is not well understood. 

In mammals, targeting of UPF1 to mRNAs that harbor a PTC is 
primarily thought to occur via its specific interactions with addi- 
tional, strategically positioned NMD factors. Pre-mRNA splicing 
results in the deposition of a multiprotein complex, known as the 
exon junction complex (EJC), ~20-24 nt upstream of the exon-exon
junction. The EJC can recruit many different factors that
affect mRNA metabolism, including the NMD factors UPF2 and 
UPF3 (Le Hir et al. 2001). When recruited to the EJC and suffi- 
ciently downstream from a TC, UPF2 and UPF3 can stabilize UPF1 
interactions at the terminating ribosome and stimulate both its 
ATPase and its helicase activity (Chamieh et al. 2008; Chakrabarti 
et al. 2011) as well as its phosphorylation by the kinase SMG1 
(Yamashita et al. 2001). These activities, in turn, trigger a cascade of 
events resulting in degradation of the target mRNA. Exon-exon 
junctions positioned >50 nt 3' from the TC (downstream exon- 
exon junctions or dEJs) trigger NMD of the host mRNA (Cheng and 
Maquat 1993), a distance likely reflecting the sizes of the termi- 
nating ribosome and EJC. Since EJCs are normally displaced by a 
transiting ribosome during the first or "pioneer" round of trans- 
lation (Lejeune et al. 2002), typical mammalian mRNAs lacking 
dEJs (Nagy and Maquat 1998; Giorgi et al. 2007) will be cleared of 
EJCs in this process and will, therefore, fail to recruit UPF1 and will 
escape from NMD. 

An additional feature of mRNAs that enhances NMD sus- 
ceptibility is extended 3' UTR length (Buhler et al. 2006). Factors 
that associate with poly(A) tails (mainly poly[A] binding protein,
cytoplasmic 1, PABPC1) can compete with UPF1 for binding to the 
terminating ribosome (Behm-Ansmant et al. 2007; Singh et al. 
2008), and modulation of the PABPC1-TC distance alters message 
stability (Amrani et al. 2004; Eberle et al. 2008). Recent studies 
demonstrated that UPF1 can associate with 3' UTRs of some
mRNAs, including an endogenous, long 3' UTR previously shown
to be sufficient for decay (Hogg and Goff 2010; Kurosaki and 
Maquat 2013). However, specificity of UPF1 for particular UTRs is 
not understood, and the transcriptome-wide binding profile of 
UPF1 remains largely unknown. Furthermore, the relative contributions
of dEJs and 3' UTR length to NMD of endogenous mRNAs
have not been assessed genome-wide. 

Despite progress in understanding NMD mechanisms, the 
canonical determinants of NMD (3' UTR length and presence of
a dEJ) do not fully explain the observed impact of NMD on the
transcriptome. For example, many mRNAs that appear as NMD 
targets in genome-wide studies lack these canonical features, and 
many transcripts that harbor these traits are not repressed, sug- 
gesting that they possess features that enable full or partial escape 
from degradation. Genome-wide, presence of an upstream open 
reading frame (uORF) in a gene's 5' UTR has been associated with 
NMD (Mendell et al. 2004; Ramani et al. 2009; Yepiskoposyan et al. 
2011). However, detailed analysis of specific uORF-containing 
mRNAs has revealed that only a fraction is actually targeted by 
this pathway (Linz et al. 1997; Stockklausner et al. 2006; Zhao 
et al. 2010). Genes with longer than average 3' UTRs have been 
associated globally with decay (Mendell et al. 2004; Hansen et al. 
2009; Ramani et al. 2009; Yepiskoposyan et al. 2011), but only 
a few specific UTRs have been shown to confer this activity (Singh 
et al. 2008; Yepiskoposyan et al. 2011). Similarly, direct binding of
mRNAs by UPF1 has been associated with NMD for only a handful 
of metazoan messages (Hogg and Goff 2010; Hwang et al. 2010). 
Thus, large-scale identification of direct NMD targets remains 
challenging, and the transcriptome-wide binding and activity of
UPF1 remain insufficiently characterized.

Here, we sought to define the role of UPF1 in gene expression 
of an early developmental system, murine embryonic stem cells 
(mESCs), by identifying UPF1 binding locations within the tran- 
scriptome and globally measuring the changes in mRNA abun- 
dance and translation following perturbations to the NMD path- 
way. We associate uORF translation with NMD susceptibility and 
identify a class of UPF1-bound mRNAs that undergo repression by
NMD in the absence of canonical NMD features. Interestingly, the 
set of messages bound by UPF1 in mESCs is repressed by NMD in 
other mouse cells/tissues, and NMD-dependent repression of this 
group of mRNAs is conserved in humans. Our results enabled us to 
describe additional features associated with NMD, to quantify the 
contributions of these and canonical NMD-triggering features to 
the decay of endogenous mRNAs, and to better understand the role 
of NMD in embryonic cells. 

Results 

Hundreds of mRNAs with dEJs and long 3' UTRs 
are derepressed by UPF1 depletion and translational 
inhibition in mESCs 

To identify NMD-regulated genes and isoforms in an early de- 
velopmental system, we performed RNA-seq analysis of mESCs 
(v6.5) depleted of UPF1 or treated with cycloheximide (CHX). 
CHX is a potent translation elongation inhibitor, and relatively 
short treatment of cells with this drug results in the stabilization of 
NMD-targeted mRNAs (Carter et al. 1995). We reasoned that use of 
multiple methods to inhibit NMD, including a translational in- 
hibitor, would increase our ability to identify authentic NMD target 
mRNAs and that RNA-seq analysis would enable isoform-specific 
as well as gene-level quantitation. Stable mESC lines were derived
using two independent shRNA sequences targeting Upf1 (denoted
Upf1-1 and Upf1-2) or a control shRNA targeting GFP. In cells
infected with Upf1-specific shRNAs, UPF1 protein and mRNA levels
were reduced to 21%-37% and 14%-15%, respectively, of those in 
control cells (Supplemental Fig. S1A,B). POU5F1 (also known as 
OCT4) levels and alkaline phosphatase staining remained similar 
between UPF1- and control-depleted cells, indicating that the ESC state
is maintained in the knockdowns (Supplemental Fig. S1C,D).
Translational inhibition using CHX was performed on wild-type 
mESCs for 2 h, a duration that caused a significant increase in 
abundance of known NMD target mRNAs without causing visible 
cytotoxicity. 

RNA-seq reads were mapped to the mouse genome and exon- 
exon junctions, and both gene- and isoform-specific abundances 
were calculated (Trapnell et al. 2010). Pairwise comparisons of gene 
and mRNA expression values for each experiment were made relative
to controls, i.e., v6.5 CHX to v6.5, Upf1-1 to GFP, and Upf1-2
to GFP, following normalization (see Methods). Expression changes 
were more similar between the two RNAi experiments than between 
either of these and CHX treatment, as expected (Fig. 1A). The 
overlap between the sets of mRNA isoforms whose expression increased
or decreased by more than 1.1-fold in the three NMD inhibition
treatments was twice that expected by chance, and the
extent of this overlap rose with increasing fold change, indicating 
consistency in the response across the treatments (Supplemental 
Fig. S1E). The extent of overlap above background was greater for 
mRNAs that increased in abundance after treatment than for those 
that decreased, consistent with NMD's function as a decay pathway 
(Supplemental Fig. S1E). 
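
The overlap comparison described above can be illustrated with a minimal sketch that computes the observed three-way overlap of changed isoforms alongside the overlap expected if the three treatments were independent. The independence null model, the function name `overlap_enrichment`, and the toy isoform IDs are assumptions made for illustration only; the background model actually used is described in the Methods and may differ.

```python
def overlap_enrichment(sets, universe_size):
    """Observed vs. expected size of the intersection of several sets.

    The expected size assumes each set is an independent random draw from
    a universe of `universe_size` isoforms (an illustrative null model).
    """
    observed = len(set.intersection(*map(set, sets)))
    expected = float(universe_size)
    for s in sets:
        expected *= len(set(s)) / universe_size
    return observed, expected


# Toy example with hypothetical isoform IDs.
upf1_kd_1 = {"iso1", "iso2", "iso3", "iso4"}
upf1_kd_2 = {"iso2", "iso3", "iso4", "iso5"}
chx = {"iso3", "iso4", "iso6", "iso7"}
observed, expected = overlap_enrichment([upf1_kd_1, upf1_kd_2, chx], universe_size=10)
print(observed, round(expected, 2))
```

The ratio `observed / expected` is then the fold enrichment of the overlap over background under this simple null.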

We next sought to assess the relative importance in the mESC 
NMD pathway of canonical targeting features by analyzing mRNAs 
with one or more downstream exon-exon junctions (dEJs) or with 
varying 3' UTR lengths. We defined a dEJ as an exon-exon junction 
located >50 nt 3' from an annotated TC (Nagy and Maquat 1998), 
a classification that includes both mRNAs that harbor a PTC (e.g., 
as a result of alternative splicing) and mRNAs with introns in their 
3' UTRs. While not a universal rule (Sauliere et al. 2012; Singh et al.
2012), these mRNAs are likely to have an EJC between the TC and 
dEJ, potentially stimulating UPF1 activity. As expected from previous
studies in other cell types and organisms, messages harboring
a dEJ increased significantly in abundance following Upf1
knockdown relative to mRNAs without dEJs (Fig. 1B). Since mRNAs
whose expression changed similarly in the three NMD inhibitory 
treatments are likely enriched for authentic NMD targets, we devel- 
oped a consistency criterion (see Methods) and identified mRNAs 
and genes that consistently increased, consistently decreased, or did 
not change across the three experiments, yielding ~3900 mRNAs
and ~4500 genes designated as consistent (Supplemental Table S2).
Indeed, the consistent subset of dEJ-containing mRNAs showed 
stronger derepression (median fold change ~1.19, P < 1 × 10⁻⁷)
upon Upf1 knockdown than the full set of dEJ-containing messages
(~1.12-fold, P < 1 × 10⁻¹³) relative to non-dEJ mRNAs, supporting
the enrichment for authentic NMD targets in this set (Fig. 1B,C; and 
not shown). Similar comparisons between two control clones 
expressing a GFP-targeting hairpin yielded much smaller fold 
changes of ±4% (NS, not shown). 
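
The dEJ definition used in this section (an exon-exon junction located more than 50 nt 3' of the annotated TC) can be expressed as a short sketch. The function name `has_dej` and the use of 0-based positions along the spliced transcript are assumptions for illustration; they are not taken from the analysis code used in the study.

```python
def has_dej(stop_codon_end, junction_positions, min_distance=50):
    """Return True if any exon-exon junction lies more than `min_distance`
    nt 3' of the termination codon.

    Coordinates are positions along the spliced transcript (a convention
    assumed here for illustration).
    """
    return any(j - stop_codon_end > min_distance for j in junction_positions)


# Hypothetical transcripts whose TC ends at position 900.
print(has_dej(900, [200, 450, 930]))   # nearest downstream junction is 30 nt away
print(has_dej(900, [200, 450, 1010]))  # a junction lies 110 nt downstream
```

Junctions upstream of the TC give negative distances and are ignored, so only junctions in the 3' UTR (or downstream from a PTC) can satisfy the rule.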

Increasing mRNA 3' UTR length was correlated with in- 
creasing derepression following NMD inhibition for each UTR 
length bin considered (Fig. 1D). The difference in derepression
between mRNAs with long 3' UTRs (>1500 nt) and mRNAs with 
the shortest 3' UTRs (50-350 nt) was somewhat greater (on average 
1.26-fold) for the Upf1 knockdowns than for the CHX treatments



Genome Research 1637 

www.genome.org 



Hurt et al. 



[Figure 1 graphics not reproduced. Legible panel annotations: (A) Venn diagram of CHX, Upf1-1, and Upf1-2; (B) x-axis, change in mRNA abundance (Upf1-1/GFP); dEJ cons n = 241, dEJ n = 693, non-dEJ n = 10193; (D) x-axis, change in mRNA abundance (Upf1-1/GFP); 3' UTR length bins (nt): 50-350, n = 1021; 350-500, n = 341; 500-1000, n = 843; 1000-1500, n = 546; 1500-10000, n = 1146.]

Figure 1. Consistent derepression of hundreds of mRNAs with and without canonical NMD features occurs following UPF1 depletion and translational
inhibition. (A) Overlap of mRNAs that changed expression by more than 1.1-fold in the same direction in each of three NMD inhibition experiments
(shRNA Upf1-1, shRNA Upf1-2, and CHX treatment). (B) Cumulative distribution functions (CDFs) of changes in mRNA abundance following UPF1
depletion (shRNA Upf1-1) for all dEJ mRNAs (dashed red line), consistently changing dEJ mRNAs (solid red line), or mRNAs without an annotated dEJ (black
line). P-value determined by Wilcoxon rank sum test. (C) Ratios of median fold expression change following NMD inhibition of dEJ cons to non-dEJ isoforms.
Error bar represents standard error of the two populations compared. P-values determined as in B. (D) As in B for isoforms behaving consistently with
different annotated 3' UTR lengths (different green lines). Median expression changes and standard error are shown at right. See also Supplemental Figure
S1F. (E) As in C for mRNAs with long (1500-10k nt) versus short (50-350 nt) 3' UTRs. (F) Interaction between dEJ and 3' UTR length. Median fold change in
expression following UPF1 depletion of consistent mRNAs with different 3' UTR lengths without (left) and with (middle) an annotated dEJ. Ratios of median
expression change following UPF1 depletion between mRNAs with and without an annotated dEJ of a given 3' UTR length (right). Significance of
differences between different 3' UTR length bins was determined by permutation test (n = 2000). This trend was also observed when comparing
expression changes between each dEJ isoform and non-dEJ isoforms with equivalent 3' UTR lengths (±10%) following NMD inhibition (Supplemental Fig.
S1H). Results were similar for other NMD inhibitory treatments. All fold change values and ratios are plotted on a log2 scale. P-values: (*) P < 0.05, (**) P <
0.01, (***) P < 0.001, (****) P < 0.0001.



(1.15-fold, all P < 2 × 10⁻²³) (Fig. 1E). In all cases, the magnitude of
the 3' UTR length effect seen in the experimental treatments 
exceeded that observed between controls (Methods). The de- 
repression associated with 3' UTR length was not dependent on the 
presence of a dEJ (Fig. 1F), as mRNAs with constitutive long 3' UTRs
lacking dEJs exhibited similar derepression upon NMD inhibition 
(Supplemental Fig. S1F). Similar results were observed using differ- 
ent minimum expression cutoffs (Supplemental Fig. S1G). 
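
To make the length-bin comparison concrete, the following sketch groups mRNAs by annotated 3' UTR length and takes the ratio of median fold changes between the longest and shortest bins. The bin edges follow the Figure 1D legend; the half-open bin convention, the function name `median_fold_change_by_bin`, and the toy values are assumptions for illustration.

```python
from statistics import median

# 3' UTR length bins (nt) as listed in the Figure 1D legend.
BINS = [(50, 350), (350, 500), (500, 1000), (1000, 1500), (1500, 10000)]


def median_fold_change_by_bin(records, bins=BINS):
    """Group (utr_length, fold_change) pairs into length bins and return
    the median fold change per non-empty bin. Bins are treated as
    half-open [low, high), an assumption made for this sketch."""
    grouped = {b: [] for b in bins}
    for utr_length, fold_change in records:
        for low, high in bins:
            if low <= utr_length < high:
                grouped[(low, high)].append(fold_change)
                break
    return {b: median(v) for b, v in grouped.items() if v}


# Hypothetical (3' UTR length, knockdown/control fold change) pairs.
toy = [(100, 1.0), (300, 1.1), (2000, 1.3), (4000, 1.5), (700, 1.2)]
medians = median_fold_change_by_bin(toy)
long_vs_short = medians[(1500, 10000)] / medians[(50, 350)]
print(medians, round(long_vs_short, 3))
```

A `long_vs_short` ratio above 1 corresponds to the kind of length-associated derepression reported here (for example, 1.26-fold for the Upf1 knockdowns).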

In summary, dEJs and long 3' UTRs were associated with a
similar extent of NMD activity in mESCs. While median expression
changes were moderate, some mRNAs changed much more than
this. For example, 24 consistent dEJ messages and 33 consistent 
long 3' UTR messages were derepressed more than twofold fol- 
lowing NMD inhibition (Supplemental Table S2). The observed 
fold changes almost certainly underestimate the magnitude of 
NMD's effects, since the ~65%-80% knockdown of Upf1 achieved
likely does not completely abolish NMD activity. Some messages
that possessed either a dEJ or a long 3' UTR were not derepressed by
NMD inhibition, either because they remain repressed by residual 
UPF1 activity or perhaps because they have additional features that
enable escape from NMD. We observed reduced repression of
mRNAs harboring long 3' UTRs containing A-rich segments and 
a weaker effect for T-rich segments (not shown). Poly(A) stretches 
internal to a long 3' UTR and proximal to a TC might recruit 
PABPC1, thus shortening the apparent 3' UTR and potentially 
inhibiting NMD of the message. 

Repression associated with a dEJ is strongest in the context 
of a short 3' UTR 

While addition of a dEJ to an mRNA with a long 3' UTR can enhance
NMD-associated repression (Singh et al. 2008), the relationship 
between 3' UTR length and presence of a dEJ as NMD determinants 
has not been assessed genome-wide in mammalian cells. To address
this question, we compared the changes in abundance of dEJ and 
non-dEJ subsets of mRNAs in different 3' UTR length classes. In 
general, presence of a dEJ was associated with increased derepression,
irrespective of 3' UTR length class, following NMD inhibition
(Fig. 1F; Supplemental Fig. S1H; data not shown). However, after
correction for the derepression associated with UTR length, the 
relative increase in expression associated with presence of a dEJ was 
much higher (1.63-fold) for mRNAs with short 3' UTRs (50-350 nt) 
than for those with longer UTRs (1.14-fold for UTRs longer than 800 
nt) (Fig. 1F). This finding suggests that NMD triggered by a downstream
EJC is most active for transcripts with short 3' UTRs and that
transcripts with longer UTRs are less sensitive to the presence of a dEJ. 

Genes derepressed following NMD inhibition are enriched 
for transcription factors 

Analysis of the biological functions of derepressed genes revealed 
expected results as well as some surprises. Several known NMD- 
targeted isoforms increased in abundance upon NMD inhibition in 
mESCs, including isoforms of genes involved in pre-mRNA splicing 
and NMD itself (Supplemental Table S2). In addition, one of the 
largest and most strongly enriched categories among derepressed 
genes was transcriptional regulators, including many DNA binding 
transcription factors (GO:0045449, regulation of transcription, P =
1.5 × 10⁻¹¹, Benjamini-corrected P = 4.4 × 10⁻⁹) (Supplemental
Table S3). While some changes might be indirect (Dahlseid et al. 
2003), repression of transcription factors (TFs) by NMD has also 
been previously observed in mouse embryonic fibroblasts (MEFs) 
and HeLa cells (McIlwain et al. 2010; Wang et al. 2011).

Translated but not untranslated uORFs are associated 
with NMD 

mRNAs harboring upstream open reading frames (uORFs) may be 
susceptible to NMD. If a uORF is translated prior to initial translation
of the main ORF in a gene with typical intron distribution,
downstream EJCs will be present when the ribosome terminates. 
Additionally, the typically large distance from the uORF to the 
poly(A) tail could trigger NMD. Under this model, however, decay is
triggered only if the uORF is translated. 

A previous integrative study reported that genes with uORFs 
tend to produce ~10%-40% less protein than those without 
uORFs, with less significant effects on mRNA levels (Calvo et al. 
2009). Furthermore, several cases of uORFs that seemingly escape 
NMD have been described (Stockklausner et al. 2006), leaving the 
question open as to the degree that uORFs globally affect mRNA 
stability. Only recently has the translational status of uORFs been 
assessed genome-wide (Ingolia et al. 2009, 2011). Here we sought
to identify uORFs that are actively translated and to assess their
contribution to NMD in the mESC transcriptome. 

Ribosome footprint profiling (Ingolia et al. 2009) was performed
using UPF1-depleted and control-depleted mESCs, and ribosome
locations were mapped within mRNAs to assess the
translational status of uORFs. The density of footprint reads was
used to distinguish actively translated uORFs ("tuORFs") from 
nontranslated uORFs ("ntuORFs") in each cell line (Supplemental 
Table S4). In our classification scheme, we only considered uORFs 
located completely upstream of the annotated translation start site 
in order to cleanly distinguish footprint reads belonging to the uORF 
from those of the main ORF. Overall, the density of footprint reads 
in uORFs was well correlated between experiments (Spearman ρ =
0.86 between Upf1-1 and control cells) (Supplemental Fig. S2A). We
defined a tuORF as a uORF that had ribosome footprint coverage at 
least fivefold greater than that of surrounding regions and defined 
an ntuORF as a uORF that had footprint coverage no greater than the 
coverage of surrounding regions (see Methods). These definitions
are conservative, enabling determination of translation status when 
the evidence is fairly strong, but leaving some uORFs unclassified. 
Genes were then classified by the presence and translation status 
of their uORFs. Ribosome footprint data for a typical tuORF-containing
gene, the transcription factor Dmtf1, and an ntuORF-containing
gene, Armc1, are shown in Figure 2A. Using these
definitions, we identified 392 and 464 tuORF genes in control and
Upf1 knockdown cells, respectively, with most (347) in common
between the two sets. Conversely, we identified 237 and 204
ntuORF genes in control and Upf1 knockdown cells, respectively.
Most of the ntuORFs identified in each cell line also had low uORF 
to background footprint density ratios in the other cell line (Sup- 
plemental Fig. S2B), and <1% of all uORFs that were confidently 
classified changed classification between cell lines. For downstream 
analyses, we used uORF classifications derived from Upf1 knockdown
cells, as we reasoned that this condition would enhance our
opportunity to observe isoforms that are actively targeted by NMD. 
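
The tuORF/ntuORF definitions given above can be summarized as a small decision rule. The sketch below applies the stated thresholds (at least fivefold above surrounding regions for a tuORF; no greater than surrounding regions for an ntuORF). The function name `classify_uorf`, the use of reads per nucleotide as the density unit, and the handling of zero background coverage are assumptions; how surrounding regions are delimited is specified in the Methods and is not reproduced here.

```python
def classify_uorf(uorf_density, flank_density, tu_ratio=5.0):
    """Label a uORF from ribosome footprint densities (reads per nt).

    'tuORF'  : coverage at least `tu_ratio`-fold above the flanking regions
    'ntuORF' : coverage no greater than the flanking regions
    None     : intermediate evidence, left unclassified
    """
    if flank_density <= 0:
        # Without background coverage the ratio is undefined; the handling
        # of such cases is not stated, so they are left unclassified here.
        return None
    ratio = uorf_density / flank_density
    if ratio >= tu_ratio:
        return "tuORF"
    if ratio <= 1.0:
        return "ntuORF"
    return None


# Hypothetical densities for three uORFs.
for uorf, flank in [(10.0, 1.0), (0.5, 1.0), (3.0, 1.0)]:
    print(uorf, flank, classify_uorf(uorf, flank))
```

Leaving the intermediate range unclassified mirrors the conservative intent of the definitions: a call is made only when the evidence is fairly strong.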

Notably, tuORF genes were modestly but significantly dere- 
pressed relative to ntuORF genes following UPF1 depletion (P < 
0.001 for both hairpins) (Fig. 2B,C). While the degree of tuORF- 
associated derepression was strongest for the subset of consistent 
genes, it was also significant for all tuORF-containing genes (Fig. 2B). 
Furthermore, tuORF genes were significantly derepressed upon 
NMD inhibition compared to uORF-containing genes overall (Fig. 
2B; Supplemental Fig. S2C). Similar trends with smaller magnitudes 
were observed following translational inhibition (Fig. 2C; Supple- 
mental Fig. S2C). Together, these results suggest that regulated uORF 
translation can often modulate mRNA stability via NMD. 

Interestingly, tuORF-containing genes were enriched for 
transcriptional regulators compared to all expressed genes
(GO:0045449, regulation of transcription, P = 3.7 × 10⁻⁵, Benjamini-corrected
P < 0.05). Furthermore, we observed that genes encoding
transcriptional regulators were enriched 1.5-fold for tuORFs compared
to all expressed genes (P = 4.1 × 10⁻⁶), and this enrichment
increased to twofold for consistently derepressed messages (Fig. 2D). 
Together, these findings suggest that NMD triggered by uORF trans- 
lation is an important mechanism of gene expression regulation in 
mESCs and particularly for modulators of transcription. 
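
The enrichment of tuORFs among transcriptional regulators was evaluated with a hypergeometric test (see the Figure 2 legend). A minimal sketch of the upper-tail probability is given below; the function name `hypergeom_enrichment_p` and the toy counts are assumptions for illustration and do not correspond to the gene counts in the study.

```python
from math import comb


def hypergeom_enrichment_p(population, successes, draws, observed):
    """P(X >= observed) for X ~ Hypergeometric(population, successes, draws).

    population : all expressed genes
    successes  : genes in the population that carry a tuORF
    draws      : genes in the category of interest (e.g., transcriptional regulators)
    observed   : genes in that category that carry a tuORF
    """
    upper = min(successes, draws)
    total = comb(population, draws)
    tail = sum(
        comb(successes, k) * comb(population - successes, draws - k)
        for k in range(observed, upper + 1)
    )
    return tail / total


# Toy counts: 20 genes, 5 with a tuORF; 4 regulators, 2 of them with a tuORF.
p_value = hypergeom_enrichment_p(population=20, successes=5, draws=4, observed=2)
print(round(p_value, 4))
```

Using exact integer binomial coefficients avoids floating-point underflow for the small P-values typical of genome-wide enrichment tests.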

Identification of hundreds of mRNAs bound by UPF1, mostly 
in 3' UTRs 

One challenge facing study of mammalian NMD and of UPF1, in 
particular, is the identification of direct regulatory targets. While
Figure 2. Translation of uORFs is associated with UPF1-mediated repression. (A) mRNA-seq (blue) and ribosome footprint (orange) reads from UPF1-depleted
(shRNA Upf1-1) cells mapping to Dmtf1 and Armc1 mRNAs. Dmtf1 (top) contains a tuORF (outlined with dark gray lines), while Armc1 contains
a single ntuORF (dashed lines). (B) CDFs of changes in mRNA abundance following UPF1 depletion (shRNA Upf1-1) for all genes with a tuORF (dashed
yellow line), consistent genes with a tuORF (orange line), genes with a uORF (dashed gray line), and genes with an ntuORF (black line). P-values
determined by Wilcoxon rank sum test. (C) Ratios of median fold expression change following NMD inhibition of consistent genes with a tuORF to genes with
an ntuORF. Error bars represent standard error of the two populations compared. P-values determined as in B. (D) Fraction of genes harboring a tuORF for all
expressed genes, all expressed transcriptional regulators (GO:0045449), and transcriptional regulators derepressed by more than 1.1- or 1.2-fold in at least
two out of three NMD inhibition experiments. P-value of enrichment determined by hypergeometric test. Numbers of genes in each category are indicated.
Error bars indicate binomial standard deviations. Fold change values and ratios shown in B and C are plotted on log2 scale. Asterisks as in Figure 1.



UPF1-bound mRNAs have been associated with NMD in yeast
genome-wide (Johansson et al. 2007), metazoan studies have
mostly inferred UPF1 targets using indirect evidence such as changes
in gene expression following UPF1 depletion. Here, we identified 
binding targets of UPF1 in mESCs using CLIP-seq. Wild-type mESCs 
were UV-crosslinked, and the resulting RNA-UPF1 complexes were 
immunoprecipitated using antibodies against endogenous UPF1 
after limited RNase digestion. Since the RNase used can impact 
CLIP-seq results (Kishore et al. 2011), we prepared libraries using 
both RNase A (two libraries: Upf1.A1 and Upf1.A2) and RNase I
(one library: Upf1.I) to enhance the robustness of the analysis. Small
RNA fragments that coprecipitated with UPF1 were isolated, am- 
plified, and sequenced. Anti-rabbit IgG precipitates harvested in 
parallel contained little or no crosslinked RNA, indicating low levels
of intact background RNA-protein complexes remaining after stringent
washing during the CLIP procedure (Supplemental Fig. S3A).
After mapping the resulting CLIP-seq reads to the mouse genome
and transcriptome and subtracting background read density, we
determined the fraction of reads mapping to different genic regions
(Supplemental Fig. S3B). The density of CLIP reads per nucleotide
was ~10- to 30-fold higher in exons than introns in all samples,
consistent with the expectation that UPF1 interacts predominantly 
with mature mRNAs in the cytoplasm (Supplemental Fig. S3C). 

Based on the standard model of NMD activity, we had initially 
hypothesized that the majority of binding events would reside in 
close proximity to PTCs and/or dEJs. Instead, we observed a pro- 
nounced bias for UPF1 binding to occur in mRNA 3' UTRs, which 
was consistent in all three CLIP libraries (Fig. 3A). When combining
Figure 3. UPF1 interacts predominantly with 3' UTRs of mature mRNAs. (A) Distribution of CLIP-seq reads mapping to 5' UTRs, coding sequences
with standard deviation of 10 nt. (C,D) Correlation of UPF1 CLIP samples binding in 3' UTRs of genes with minimum FPKM (fragments per kilobase per million
mapped reads) of 50. Correlations of MBNL1 CLIP data in mouse C2C12 cells and two mouse brain samples (Wang et al. 2012) and of AGO2 CLIP data in
the data for all genes in a metagene analysis, the density of UPF1
binding increased rapidly to ~10 times that seen in coding regions
just downstream from the TC and remained high throughout the
3' UTR (Fig. 3B). Preferential binding to the 3' UTRs of specific mRNAs
was observed (controlling for gene expression) (Supplemental Fig.
S3D), and these preferences were strongly correlated across replicate
UPF1 CLIP samples, indicating the gene-specific nature of the UPF1
binding signal (Fig. 3C). UPF1 also exhibited preferential binding to
specific locations within 3' UTRs (Fig. 3D). The positions of binding
along 3' UTRs were correlated between replicates and clustered 
separately from CLIP-seq locations obtained for other RNA binding 
proteins (AGO2 and MBNL1) in two recent studies of mouse cells
(Leung et al. 2011; Wang et al. 2012), indicating the specificity of the
interactions identified. Correlations involving sample Upf1.A2 were
less strong than those between samples Upf1.I and Upf1.A1, likely
reflecting the lower complexity of the Upf1.A2 library. Comparison
of UPF1 and AGO2 binding sites revealed some significant overlaps
(Supplemental Fig. S3E). While overlap by itself does not imply
a functional relationship, a previous study showed that AGO2 can
inhibit NMD (Choe et al. 2010). 

While analysis of UPF1 binding sites in 3' UTRs did not reveal 
a clear sequence motif, we did find that UPF1-bound regions are
enriched for guanosine residues (P < 0.0001, χ² test) (Fig. 3E).
Given UPF1's function as an RNA helicase, we also analyzed RNA
structural features. We observed that UPF1 binding sites had higher
propensity to form secondary structures (more negative ΔG_folding)
than surrounding areas (Supplemental Fig. S3F), an effect that was
significant overall but could be attributed to increased GC content
(not shown). Thus, our data suggest that UPF1's residence within
a 3' UTR is biased toward primary sequences rich in G nucleotides
or toward structures produced by G-rich RNA. Furthermore, analysis
of the two CLIP libraries that had deeper coverage (Upf1.A1
and Upf1.I) (Supplemental Table S1) revealed that the extent of
UPF1 binding in the upstream half of 3' UTRs was correlated with
the extent of binding to the downstream half of the same UTR
(Spearman ρ = 0.3 to 0.4, P = 0.018 and 0.0013, respectively, in the
two libraries). This observation might result from sliding (translocation)
of UPF1 along the 3' UTRs of some mRNAs (Melero et al.
2012). Together, the binding data paint a picture of a factor with 
a moderate degree of specificity for particular mRNAs and locations 
within their 3' UTRs. 
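
The comparison of upstream and downstream 3' UTR halves can be illustrated with a minimal sketch. It assumes that binding is summarized as a count of CLIP positions per UTR half (positions given as 0-based offsets from the UTR start) and that a plain Spearman rank correlation with average ranks for ties is used. The function names are hypothetical and are not taken from the scripts used in this study.

```python
def half_counts(utr_len, site_positions):
    """Count CLIP positions in the upstream and downstream halves of a 3' UTR.

    Positions are assumed to be 0-based offsets from the UTR start.
    """
    mid = utr_len / 2
    upstream = sum(p < mid for p in site_positions)
    return upstream, len(site_positions) - upstream


def ranks(values):
    """Ranks starting at 1, with tied values sharing their average rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    result = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        average = (i + j) / 2 + 1
        for k in range(i, j + 1):
            result[order[k]] = average
        i = j + 1
    return result


def pearson(x, y):
    """Pearson correlation; undefined (division by zero) if either input is constant."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / (sxx * syy) ** 0.5


def spearman(x, y):
    """Spearman rank correlation computed as the Pearson correlation of ranks."""
    return pearson(ranks(x), ranks(y))
```

Applied across all 3' UTRs in one library, `spearman` would take the list of upstream counts and the list of downstream counts as its two arguments.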

Translation displaces UPF1 from ORFs 

To ask whether the process of translation influences UPF1 binding 
locations, we performed CLIP-seq analysis of UPF1 after 2 h of CHX 
treatment. Under these conditions, UPF1 CLIP tags were enriched 
in mature mRNAs, as in control conditions (Supplemental Fig. S3C). 
However, CHX treatment also caused a dramatic redistribution of 
UPF1 binding within mRNAs, resulting in much higher levels of 
binding to coding regions (Fig. 3A), with similar densities of binding
upstream of and downstream from TCs overall (Fig. 3B). The
redistribution of UPF1 binding locations following a 2-h inhibition of
translation indicates that UPF1 binding to RNAs is fairly dynamic 
and suggests that translating ribosomes normally displace UPF1 
from ORFs, as likely occurs for other RNA binding factors (Grimson 
et al. 2007). 

We identified significantly UPF1-bound mRNAs in control
conditions by comparing the number of UPF1-bound positions
within mRNAs to the number expected if binding were
random (controlling for gene length and expression level). Given
that UPF1 is an RNA helicase, likely interacting transiently with
RNA, we adopted a method of identifying high confidence targets
within specific gene regions (3' UTRs or coding regions) rather
than specific positions (Methods). After filtering for significant
binding in replicate CLIP samples, we identified just over 200 high
confidence target mRNAs with significant UPF1 binding in their
3' UTRs and 17 genes with significant binding to coding regions
(Supplemental Table S5). As a control, reads sampled randomly
from the RNA-seq data at 3' UTR depths comparable to those of the
CLIP reads yielded very few significantly enriched genes (Supplemental
Fig. S3G). Unbound mRNAs were defined as those displaying no
UPF1 binding in any CLIP experiment. Analyzing genes encoding
UPF1-bound mRNAs by Gene Ontology analysis did not yield
significant biases, but we noted that some bound mRNAs encoded
proteins involved in the cell cycle (thioredoxin interacting protein,
TXNIP, and cell division cycle 25A, CDC25A), ESC pluripotency
(estrogen related receptor, beta, ESRRB) (Fig. 3F; Zhang et al.
2008), and NMD (SMG6 and SMG7). Interestingly, several NMD
factors, including SMG6 and SMG7, participate in auto-regulatory
feedback circuits to regulate their own levels (Huang et al. 2011). In
the case of SMG7, at least, the 3' UTR appears to mediate this
regulation (Yepiskoposyan et al. 2011). Our data raise the
possibility that this regulation involves direct binding of UPF1 to this
mRNA's 3' UTR.

UPF1 binding in 3' UTRs is associated with repression 

We next assessed whether UPF1 binding is associated with UPF1 
activity by measuring the abundance of mRNAs bound by UPF1 
following UPF1 depletion and translational inhibition. Given that 
UPF1 binding occurred predominantly to the 3' UTRs of mRNAs, 
we chose to focus on messages bound in this region for further 
analysis. mRNAs with significant UPF1 binding in their 3' UTRs 
were derepressed compared to unbound mRNAs following all of 
the NMD-inhibitory treatments (between 1.16- and 1.20-fold, P <
1 × 10⁻⁷), implicating UPF1 binding in regulation of their mRNA
levels (Fig. 4A,B). No significant change was observed between two
control lines (<1.01-fold; not shown). Notably, despite the modest
numbers of genes identified, we also observed that genes bound by 
UPF1 in their CDS were, on average, derepressed following NMD 
inhibition (Supplemental Fig. S4A). 

UPF1 also plays a role in other cellular processes in addition to
NMD, including Staufen-mediated mRNA decay (SMD). SMD is
a translation- and UPF1-dependent, but UPF2-independent, decay
mechanism in which UPF1 is recruited to mRNA 3' UTRs via
Staufen binding (Kim et al. 2005). In order to further characterize
the regulation of the UPF1-bound mRNAs in this study, we took
advantage of two recently published mouse RNA-seq data sets
investigating the role of UPF2 and the UPF1 kinase SMG1 in gene
expression regulation (McIlwain et al. 2010; Weischenfeldt et al.
2012). The extent to which UPF1 binds to similar targets in
different cell types has not been examined comprehensively.
However, we observed significant derepression of mRNAs bound by
UPF1 in mESCs in data from Upf2 knockout liver and Smg1
knockout MEFs (1.28- and 1.20-fold, respectively; both P < 1 × 10⁻⁴),
further supporting a connection between UPF1 3' UTR binding
and NMD (Fig. 4B,C; Supplemental Fig. S4B).

The 3' UTR binding that we observed does not appear to reflect
the canonical dEJ-based NMD pathway, as genes encoding
3' UTR-bound mRNAs were not enriched for expression of
dEJ-containing isoforms (Fig. 4D). However, we did observe that
UPF1-bound 3' UTRs were, on average, 2145 nt in length, 70% longer
than the average 3' UTR length of 1262 nt (difference of all versus
bound 3' UTRs, P = 3.5 × 10⁻³⁰) (Fig. 4E). Given that extended
3' UTR length is itself an NMD-triggering feature, we asked whether
increased 3' UTR length could explain all of the derepression
associated with UPF1-bound mRNAs following NMD inhibition. We
next analyzed derepression of genes bound by UPF1, controlling
for 3' UTR length (Supplemental Fig. S4D) or for both 3' UTR
length and expression level (Fig. 4F). Interestingly, UPF1-bound
mRNAs were also derepressed relative to these control gene sets
following both Upf1 knockdown and translational inhibition
(Supplemental Fig. S4D; Fig. 4F). While long UTRs were more
likely to exhibit binding, a small subset of mRNAs with 3' UTRs
<800 nt was also bound, and this set of genes was also derepressed
upon NMD inhibition, indicating that binding is associated with
decay regardless of 3' UTR length (Supplemental Fig. S4E). These
observations suggest that, while UPF1 binding may contribute
to the association between 3' UTR length and NMD, UPF1 binding
contributes directly to mRNA decay independent of 3' UTR
length.

Given the high G content of UPF1-binding sites, we also asked
whether UPF1-bound 3' UTRs were enriched for G-rich regions. We
defined G-rich regions as 50-bp segments with G content within
the top 5% of all such segments in 3' UTRs. Indeed, we found that
UPF1-bound UTRs have, on average, nearly twice as many
G-rich regions per kb as unbound UTRs with similar lengths
and expression levels (Fig. 4G), supporting a link between UPF1
association and G content.
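
The definition of G-rich regions can be sketched as follows. This is an illustration under stated assumptions, not the analysis code used here: segments are taken as non-overlapping 50-nt tiles (the text does not say whether segments overlap), the cutoff is the G fraction of the lowest-ranked segment within the top 5%, and all function names are hypothetical.

```python
def window_g_fractions(seq, size=50):
    """G fraction of each non-overlapping window of `size` nt (tiling is an assumption)."""
    seq = seq.upper()
    return [seq[i:i + size].count("G") / size
            for i in range(0, len(seq) - size + 1, size)]


def g_rich_cutoff(all_fractions, top_percent=5):
    """G fraction of the lowest-ranked window that still falls in the top `top_percent`.

    `all_fractions` should pool windows from every 3' UTR under study.
    Windows tied with the cutoff are all counted as G-rich downstream.
    """
    ranked = sorted(all_fractions, reverse=True)
    n_top = max(1, len(ranked) * top_percent // 100)
    return ranked[n_top - 1]


def g_rich_per_kb(seq, cutoff, size=50):
    """Number of windows at or above the cutoff, per kb of sequence."""
    if not seq:
        return 0.0
    hits = sum(f >= cutoff for f in window_g_fractions(seq, size))
    return 1000.0 * hits / len(seq)
```

The per-kb values of bound and unbound UTR sets matched for length and expression could then be compared directly.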

In order to determine whether UPF1-dependent repression of
mouse mRNAs bound via their 3' UTR is conserved, we assessed
whether human homologs of genes encoding mRNAs bound by
UPF1 in mESCs were similarly repressed. Interestingly, we observed
that human homologs of mouse UPF1 targets were significantly
derepressed following UPF1 depletion in two human cell lines,
HeLa and U2OS cells (both P < 0.01) (Fig. 4B,H; Supplemental Fig.
S4C; Wang et al. 2011; Cho et al. 2012). Together, these findings
provide evidence that these genes are similarly repressed in other
mouse tissues and cells and in humans.

Genes with low translational efficiency escape NMD 

As NMD is a translation-dependent process, we asked whether the
level of translational activity influenced susceptibility to NMD. For
this purpose, we calculated the average ribosome density, also
referred to as "translational efficiency" (TE) (Ingolia et al. 2009), of
each message by dividing the ribosome footprint density of the ORF
by the RNA-seq read density of this same region. We then analyzed
the effects on mRNA stability of different NMD-associated features
as a function of TE values. When comparing TE to gene expression
changes, we calculated each measure using data from separate
experiments to avoid an established source of false positives (Larsson
et al. 2010) and used only consistently behaving mRNAs to enrich
for NMD-related effects.
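
As a minimal sketch of the TE calculation, the ratio of footprint density to RNA-seq density over the same ORF might be computed as below. The normalization to reads per kilobase per million mapped reads is an assumption made for illustration, and the function and parameter names are hypothetical.

```python
def density(read_count, length_nt, total_mapped_reads):
    """Reads per kilobase per million mapped reads (normalization is an assumption)."""
    return read_count / (length_nt / 1e3) / (total_mapped_reads / 1e6)


def translational_efficiency(fp_reads, rna_reads, orf_len_nt, fp_total, rna_total):
    """Footprint density of the ORF divided by RNA-seq density of the same ORF.

    Returns None when the ORF has no RNA-seq coverage, because the ratio
    is undefined in that case.
    """
    rna_density = density(rna_reads, orf_len_nt, rna_total)
    if rna_density == 0:
        return None
    return density(fp_reads, orf_len_nt, fp_total) / rna_density
```

Because the ORF length appears in both densities, it cancels in the ratio; it is kept in the sketch so that each density is meaningful on its own.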

Overall, TE was positively correlated with derepression following
Upf1 knockdown, consistent with the known translation-dependence
of NMD (Fig. 5A). Similar results were observed for the
other NMD inhibition treatments (not shown). Furthermore, tran- 
scripts harboring dEJs derived from genes with very low TE failed to 
exhibit significant derepression following UPF1 depletion (Fig. 5B). 
Similarly, UPF1 binding in the 3' UTR was not associated with sig- 
nificant derepression following UPF1 depletion for genes with very 
low TE (Fig. 5C). Low TE genes are likely to encode mRNAs that are 
translated infrequently, reducing their sensitivity to NMD. For ex- 
ample, such mRNAs might undergo signal-induced translation but 
otherwise be held in a nontranslating (but stable) state. Interestingly, 
little difference in 3' UTR length-dependent derepression was ob- 
served between genes grouped by TE in the UPF1 depletion experi- 
ments (Fig. 5D). Thus, 3' UTR length-dependent NMD may be less 
reliant on robust translation than regulation based on UPF1 binding 
or presence of a dEJ. While these patterns were somewhat more 
variable following CHX treatment (not shown), genes with higher 
TE values were more derepressed overall after all treatments. To- 
gether, these data provide systematic evidence for modulation of 
NMD activity by translational efficiency. 

Discussion

Features predictive of mRNA repression 

One goal of this study was to identify features that predict an
mRNA's susceptibility to NMD. In mESCs, we observed that established
NMD-triggering features (long 3' UTR, presence of a dEJ, and
presence of a uORF) were predictive of derepression following
NMD inhibition (Fig. 6A). We also observed that presence of a uORF
with ribosome footprint coverage indicative of active translation
was more predictive of regulation by NMD than mere occurrence
of a uORF (Fig. 6A), emphasizing the potential for regulated uORF
translation to impact message levels. Most notably, detection of
UPF1 binding in a gene's 3' UTR was as predictive of NMD regulation
as was presence of a long 3' UTR or dEJ, whether analyzed at the
gene level (Fig. 6A) or at the level of individual mRNA isoforms
(Supplemental Fig. S5). Furthermore, messages with longer 3' UTRs
were more likely to be bound, but binding was predictive of NMD
regulation independent of 3' UTR length (Fig. 4F). In total, ~30% of
genes in the mESC transcriptome contain at least one of the four
features characterized here (dEJ, long 3' UTR, translated uORF, or
3' UTR binding by UPF1). Overall, these genes were 1.7-fold more
likely to be repressed by NMD than were genes lacking these features
(Fig. 6A), and just requiring presence of UPF1 binding was associated
with a 2.5-fold increase in repression potential. Additional mRNA
properties that modify the efficacy of these features in triggering
NMD undoubtedly exist.

We observed that presence of a dEJ was associated with increased
mRNA repression regardless of 3' UTR length class (Fig. 1F;
Singh et al. 2008). The extent of repression associated with presence
of a dEJ was greatest for mRNAs with short UTRs (Fig. 1F),
suggesting that decay triggered by presence of a long 3' UTR might
reduce the scope of repression achievable by addition of a dEJ.

UPF1 binds extensively in the 3' UTRs of a cohort of mRNAs 

This study provides the first genome-wide identification of UPF1 
binding sites within mRNAs. Given models of UPF1 recruitment
by release factors and/or components of the EJC (Kurosaki and
Maquat 2013), we initially hypothesized that the majority of CLIP 
sites would reside near PTCs and their downstream EJCs. While we 
did observe a modest number of UPF1 CLIP reads near PTCs of dEJ- 
containing mRNAs (Supplemental Fig. S3H), most such sites were 
not detected consistently between CLIP libraries and thus were not 
emphasized here. mRNAs with dEJs may be degraded more quickly 
than other classes of NMD targets, making detection by CLIP more 
difficult. Furthermore, the small sizes of most alternative exons 
make it challenging to detect isoform-specific binding by CLIP-seq. 

Most UPF1 binding locations were distributed along the 3' 
UTRs of hundreds of mouse mRNAs (Fig. 3). This finding extends 
recent reports that UPF1 can associate with several, mostly exog- 
enous 3' UTRs (Hogg and Goff 2010; Kurosaki and Maquat 2013). 
Importantly, our genome-wide approach enabled us to find that 
UPF1 binding sites are not randomly distributed but are concen- 
trated in the UTRs of specific mRNAs (Fig. 3C,D; Supplemental Fig. 
S3D). A prior study proposed a 3' UTR length-sensing function for 
UPF1 (Hogg and Goff 2010), and we show that UPF1-bound 3'
UTRs tend to be longer than average (Fig. 4E). However, we also
show that UPF1 binding is associated with NMD-dependent
repression in excess of that predicted by 3' UTR length (Fig. 4F), that
UPF1 has specificity for certain UTRs, and that binding events that
occur within short 3' UTRs are associated with NMD. Thus, our
data indicate that message-specific features beyond 3' UTR length
and dEJ presence determine UPF1 binding and associated mRNA
decay.

UPF1's interactions with RNA are dynamic and regulated by
both ATP binding and interaction with auxiliary factors. The CLIP
assay likely captures multiple distinct states, including UPF1 that is
stably loaded onto the RNA before interaction with UPF2 or ATP
(Chakrabarti et al. 2011), helicase-active UPF1 (post-UPF2
interaction) (Melero et al. 2012), and UPF1 that is actively involved
in disassembly of mRNPs as degradation progresses (Franks et al.
2010). RNA helicases often transit through or rearrange mRNP
complexes associated with a wide variety of RNA sequences or bind
at specific mRNA locations without recognizable primary sequence
motifs (Bohnsack et al. 2009; Sievers et al. 2012). UPF1-bound
locations lacked a detectable sequence motif but displayed increased
G content (Fig. 3E). Interestingly, earlier studies using purified
human UPF1 found that its ATPase activity is more than fourfold
lower in the presence of poly(rG) than with any other
ribohomopolymer tested (Bhattacharya et al. 2000), suggesting
that UPF1 may preferentially pause at G-rich regions. Together,
our data indicate that UPF1 resides in particular mRNAs, often in
G-rich regions, and localizes mostly to 3' UTRs except when
translation is inhibited.

The degradation of messages targeted by UPF1 binding to the
3' UTR is not only UPF1-dependent but also translation-dependent
(Figs. 4B, 5C; Kurosaki and Maquat 2013). The strong association
that we observed between UPF1 binding and derepression in Upf2
knockout tissue (Fig. 4B,C) suggests that our findings are
predominantly relevant to NMD rather than SMD. How might additional
NMD factors, in particular UPF2, be recruited to typical
UPF1-bound 3' UTRs devoid of exon-exon junctions? One possibility
is that, in addition to the canonical mode (Fig. 6B, left), NMD
can also operate in a 3' UTR targeting mode, wherein UPF1
associates with 3' UTRs of messages and interacts with soluble UPF2
and/or UPF3 or with these factors bound to 3' UTRs in the absence
of an EJC (Fig. 6B). A recent study reported that the degree of UPF1
association with mature mRNA was not affected by depletion of
EIF4A3, the primary RNA-binding component of the EJC,
suggesting that UPF1's association with mRNA may not require fully
assembled EJCs (Singh et al. 2012). Interestingly, the proposed
3' UTR targeting mode of regulation could occur after the initial
rounds of translation of the message, raising the possibility that
NMD could be used to deplete bulk pools of post-pioneer messages.

Our CLIP-seq analysis of UPF1 binding appears to be far from
saturating (Supplemental Fig. S3G), indicating that UPF1 may bind
several hundred or even thousands of mRNAs, potentially
accounting for additional NMD-repressed mRNAs (Supplemental
Table S3). Notably, we observed that mRNAs bound in ESCs appear to
be similarly regulated in MEFs and in mouse liver (Fig. 4B,C;
Supplemental Fig. S4B), suggesting that binding to the same or similar
sets of mRNAs occurs in other cellular contexts. Intriguingly, human
homologs of genes bound by UPF1 in mouse were similarly
regulated by UPF1 in human cells (Fig. 4B,H; Supplemental Fig. S4C),
indicating conserved UPF1-dependent repression of this set of
genes. Together, these findings suggest that association of UPF1 with
the 3' UTRs of many mESC mRNAs plays a widespread role in gene
expression and that this mode of regulation is likely conserved in
many cell types and organisms.

NMD regulation via translation of uORFs 

We also identified translated uORFs genome-wide and observed an
association with UPF1-dependent mRNA repression. Translation of
a uORF prior to initial translation of the main ORF could trigger
EJC-dependent NMD. Translation of a uORF after main ORF translation
could also contribute to NMD by inhibiting translation of the main
ORF and thereby increasing the accessibility of the coding region to
UPF1 (as seen following CHX treatment) and/or extending the
effective length of the 3' UTR. Regulation of main ORF translation via
regulated uORF translation controls expression of key transcription
factors involved in cellular stress responses (Gaba et al. 2001; Vattem
and Wek 2004). Our data suggest that regulated uORF translation
could commonly trigger NMD to reinforce repression at the
translational level (Calvo et al. 2009). As a group, transcriptional
regulators are derepressed in response to NMD inhibition, and this group
is also enriched for tuORFs (Supplemental Table S3; Fig. 2D),
suggesting that tuORF-dependent NMD is a common regulator of the
expression levels of TFs.

Genome Research 1645
www.genome.org

Figure 6. NMD features and models of UPF1-dependent mRNA repression. (A) Predictive capacity of mRNA features for NMD regulation. Fractions of
expressed genes harboring a dEJ (red), long 3' UTR (green), UPF1 3' UTR binding (blue), uORF (light orange), and/or tuORF (orange) that were
derepressed consistently are shown. Shown for comparison are the fractions of genes derepressed consistently regardless of feature content (medium gray),
without any NMD-inducing feature (light gray), and with at least one feature (dark gray). P-values above each feature indicate significance relative to all
expressed genes; brackets indicate significant comparisons between features (hypergeometric test). See also Supplemental Figure S5A. Asterisks as in
Figure 1. (B) (Left) Canonical dEJ-mediated regulation. EJC components (blue and gray) are deposited as a consequence of splicing in the nucleus ~24 nt
upstream of an exon-exon junction (black bar). Members of the EJC, including UPF2 and UPF3, help to stabilize the transient UPF1-ribosome interaction as
well as to stimulate UPF1's phosphorylation and helicase activity, ultimately leading to decay of the message. (Right) 3' UTR UPF1 binding-mediated
regulation. UPF1 binds to mRNA 3' UTRs independent of the presence of an exon-exon junction. At some frequency, UPF1 is activated by interaction with
cytoplasmic EJC components. These factors may either be recently released from mRNAs due to translation or perhaps stably associated with message
3' UTRs independent of an exon-exon junction.

Role of NMD in the mESC transcriptional program 

Efficient depletion or elimination of several NMD components 
results in multisystemic developmental abnormalities and even- 
tual embryonic lethality in worms, zebrafish, flies, and mice, 
which may be attributable to NMD or non-NMD functions (for 
review, see Hwang and Maquat 2011; Varsally and Brogna 2012). 
Notably, examination of available ChIP data revealed that 116 
genes that were derepressed following NMD-inhibition have been 
previously identified as targets of the POU5F1 (also known as OCT4) 
TF, a master regulator of pluripotency (P < 0.005 by hypergeometric 
test) (Supplemental Table S3; Kim et al. 2008). Several mRNAs 
encoding developmentally relevant TFs, some of which contain
tuORFs, were derepressed following NMD inhibition, including
Klf9 (Martin et al. 2001), Ncor1 (Jepsen et al. 2000), and Tbx3
(Ivanova et al. 2006; Lu et al. 2011) (Supplemental Table S3).
Misregulation of these or other TFs in NMD-compromised mice might
contribute to developmental irregularities.

Our findings, including the recognition of UPF1 binding to 
3' UTRs as a widespread NMD targeting determinant, the
identification of hundreds of direct NMD targets, and delineation of the
relationships between mRNA translation and NMD susceptibility, 
provide a context for understanding the role of UPF1 and NMD in 
development and transcriptome control. 

Methods 

Cell culture and stable knockdown of Upf1

Mouse v6.5 (129SvJae × C57BL/6) ESCs were cultured on irradiated
DR4 mouse embryonic fibroblasts (MEFs) (Applied Stem Cell)
in KO DMEM (Gibco), Pen Strep, L-Glutamine, nonessential
amino acids, LIF, and either 10% FBS (HyClone) (for wild type) or
15% FBS (for knockdown lines) in gelatinized culture dishes.
Puromycin was added to media (1.5 µg/mL) during selection as well
as during routine culture of knockdown cells. Puromycin was
removed from media for 48 h prior to performing any analysis of
knockdown lines. For translational inhibition, 100 µg/mL
cycloheximide was added to culture media 2 h prior to harvest. For
stable knockdown of Upf1, mESCs (20,000 cells) were plated off of
MEFs for 24 h prior to infection with 40 µL of ~1.37 × 10⁸ titer virus
particles (RNAi Consortium shRNA Library). shRNA sequences are
listed in Supplemental Methods. After 24 h, media was changed on
all infections, and after 48 h, cells were replated with MEFs using
media containing puromycin. Clonal populations were isolated and
tested for Upf1 KD as well as expression of pluripotency factor
Pou5f1 by RT-PCR. Clones with minimal Pou5f1 expression variation
from wild-type cells but significant change in Upf1 expression were
chosen for further analysis: Upf1-1 KD (4.4, 4.7), Upf1-2 KD (5.2,
5.7), GFP-1,2 KD (2.4, 2.6).

RNA isolation and library preparation 

mESCs were trypsinized and preplated on gelatinized dishes 
for 30 min to remove MEFs prior to harvest in TRIzol reagent 
(Invitrogen). Total RNA was further purified following isopropanol 
precipitation, using RNeasy columns and on-column DNase
digestion (Qiagen). Twice Poly-T-selected RNA was isolated from
10 µg of total RNA and used as starting material in paired-end,
strand-specific dUTP (Parkhomchuk et al. 2009) library prep using the
SPRIworks Fragment library system (Beckman Coulter). Final
libraries were amplified using 14 PCR cycles, size selected by agarose
gel for 290-bp fragments, and sequenced using either 2 × 80-nt (for
knockdown cells) or 2 × 40-nt (for v6.5 and CHX v6.5 cells) reads on
an Illumina HiSeq 2000. To maximize power of detection of lowly
expressed isoforms, sequencing data from clones of the same hairpins
were merged prior to calculation of gene expression values.

UPF1 CLIP-seq 

mESCs were plated off of MEFs for 24 h prior to 254 nm UV
irradiation (400 mJ/cm²) in 15-cm plates, trypsinized, washed, and
snap-frozen. CLIP-seq was performed similar to Wang et al. (2009,
2012), using antibodies against endogenous UPF1. Details of the CLIP
procedure are described in Supplemental Methods.

Ribosome footprinting 

Footprinting was performed essentially as described in Ingolia et al. 
(2011), and RNA subtraction as described in Wang et al. (2012), 
using snap-freezing and no cycloheximide treatment. Additional 
modifications are described in Supplemental Methods. 

Computational analysis 

Computational analyses were performed using custom scripts in 
Python, Perl, MATLAB, or Matplotlib. 

Gene expression analysis 

For expression quantification, a custom gene annotation database 
was used consisting of combined 2011 Ensembl (Flicek et al. 2011)
and UCSC (Fujita et al. 2011) annotations, with duplicate tran- 
scripts removed, and a handful of well-documented PTC+ isoforms 
that were not in these database releases (Bradley et al. 2012). The 
RNA-seq and ribosome footprinting reads were mapped to the 
mouse genome (mm9) and a list of junctions enumerated from our 
annotation database using TopHat v. 1.4.0 (Trapnell et al. 2009), and 
gene and isoform expression levels were quantitated using Cufflinks 
v. 1.3.0 (Trapnell et al. 2010) as described in Supplemental Methods. 

Overlap analyses and consistency filter 

For overlap analyses, genes or isoforms changing in expression
above or below a given threshold in all three NMD-inhibiting
experiments (CHX, Upf1-1, and Upf1-2) were identified (Fig. 1A;
Supplemental Fig. S1E). For all other expression analyses, we
developed a consistency criterion. For an isoform or gene to pass this
criterion, it must have either (1) increased consistently (more than
1.1-fold increase in two or more experiments, and not decreased
more than 1.1-fold in the third), (2) decreased consistently (more
than 1.1-fold decrease in two or more experiments, and not increased
more than 1.1-fold in the third), or (3) remained consistently
unchanged (not changed more than 1.1-fold in either direction in all
experiments) (Supplemental Table S2). Isoform and gene groups
that pass this filter are designated "cons" in all figures.
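
The three-way consistency criterion can be written compactly. The sketch below assumes that each change is expressed as a treated/control ratio, so that a "more than 1.1-fold decrease" corresponds to a ratio below 1/1.1; the function name and return labels are hypothetical.

```python
def consistency_class(fold_changes, threshold=1.1):
    """Classify a gene or isoform from its fold changes in the three experiments.

    fold_changes: treated/control ratios (e.g., CHX, Upf1-1 KD, Upf1-2 KD).
    Returns "up", "down", "unchanged", or None if the criterion is not met.
    """
    n_up = sum(fc > threshold for fc in fold_changes)
    n_down = sum(fc < 1 / threshold for fc in fold_changes)
    if n_up >= 2 and n_down == 0:
        return "up"          # increased in two or more, not decreased in the rest
    if n_down >= 2 and n_up == 0:
        return "down"        # decreased in two or more, not increased in the rest
    if n_up == 0 and n_down == 0:
        return "unchanged"   # within the threshold in every experiment
    return None              # inconsistent behavior; excluded from "cons" sets
```

For example, ratios of 1.3, 1.2, and 1.0 would be classed as "up", whereas 1.3, 1.2, and 0.8 would fail the filter because the third experiment moves in the opposite direction.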

Analysis of UPF1 binding 

Unique UPF1 CLIP sequences were first trimmed for adapter
sequence and 5' randomized barcode (2 nt) to yield fragments
22-38 nt in length and then mapped uniquely to the mouse genome
and splice junction database allowing 2-nt mismatches
(-m 1 --best --strata) using Bowtie (v.0.12.6) (Langmead et al. 2009). Sequencing
reads from the IgG libraries were similarly processed, after which 
read counts at a given position were amplified by an order of 
magnitude and "subtracted" from the respective UPF1 library by 
iteratively canceling out any reads that overlapped. Regional dis- 
tribution (exons, introns, intergenic/other) of unique UPF1 CLIP 
(IgG-subtracted) sequences and unique RNA-seq reads were cal- 
culated by determining if a read mapped within any known coding 
region, any UTR, and finally within any intron. Remaining reads 
were assigned to the intergenic/other category. Lengths of all these 
regions were calculated based on the best isoform for each coding 
gene. A P-value for binding to a given region of a gene or isoform
was calculated for each CLIP library using a Poisson distribution as 
described in the Supplemental Methods. 
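
The exact null model is given in the Supplemental Methods and is not reproduced here. As a rough illustration only, a Poisson test of this kind could compare the observed number of CLIP positions in a region with an expectation proportional to region length times expression. That weighting, and the names `expected_sites` and `poisson_sf`, are assumptions made for this sketch.

```python
import math


def expected_sites(region_len_nt, region_expr, total_weight, total_sites):
    """Expected CLIP positions in a region if binding were random.

    The weight of a region is taken as length x expression (an assumption);
    `total_weight` is the sum of such weights over all regions considered.
    """
    return total_sites * (region_len_nt * region_expr) / total_weight


def poisson_sf(k, lam):
    """Upper-tail probability P(X >= k) for X ~ Poisson(lam).

    Computed as 1 - CDF(k - 1); this loses precision for extremely small
    tail probabilities, which is acceptable for an illustration.
    """
    if k <= 0:
        return 1.0
    term = math.exp(-lam)      # P(X = 0)
    cdf = term
    for i in range(1, k):
        term *= lam / i        # P(X = i) from P(X = i - 1)
        cdf += term
    return max(0.0, 1.0 - cdf)
```

A region with 20 observed positions and an expectation of 5 would receive a very small P-value under this model, whereas a region with no observed positions would receive P = 1.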

UPF1 binding correlation 

Correlation coefficients between binding of UPF1, MBNL1 (Wang 
et al. 2012), and AGO2 (Leung et al. 2011) were calculated as the
correlation of CLIP densities in each 3' UTR (Fig. 3C) or in 100-nt 
windows within each 3' UTR (Fig. 3D) for highly expressed genes 
(FPKM > 50) similar to Wang et al. (2012). 

Comparison with previously published data 

Upf2 KO and control mouse liver data sets were downloaded from 
Gene Expression Omnibus (GSE26561) (Weischenfeldt et al. 
2012), and Smg1 KO and control MEF data were obtained from the
lab of Benjamin Blencowe (University of Toronto) (McIlwain et al.
2010). All RNA-seq data were processed as described for data gen- 
erated in this study. Homologs of UPF1 bound and unbound genes 
were determined using the BioMart tool (Vilella et al. 2009). To 
assess gene expression changes for homologs of bound genes, 
microarray data were downloaded from Gene Expression Omni- 
bus, and the following parameters were used: For GSE30499, a 
minimum expression threshold of 64 was required of the RMA 
processed data for inclusion in analyses (Wang et al. 2011); for 
GSE26781, a minimum expression threshold of 8 of the log trans- 
formed, quantile normalized data was required for inclusion in 
analyses (Cho et al. 2012). For cases in which a gene was represented 
by multiple probe sets, the mean value of all the corresponding 
probe sets was used. 
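
The probe-set handling can be sketched as below. Whether the expression threshold was applied before or after averaging is not stated, so applying it to the per-gene mean is an assumption; the function name and the dictionary-based inputs are hypothetical.

```python
from collections import defaultdict


def gene_level_expression(probe_values, probe_to_gene, min_expr):
    """Average probe sets per gene, then keep genes at or above `min_expr`.

    probe_values:  {probe_set_id: expression value}
    probe_to_gene: {probe_set_id: gene_id}; unmapped probe sets are ignored.
    min_expr:      e.g., 64 for the RMA-processed data or 8 for the
                   log-transformed, quantile-normalized data.
    """
    by_gene = defaultdict(list)
    for probe, value in probe_values.items():
        gene = probe_to_gene.get(probe)
        if gene is not None:
            by_gene[gene].append(value)
    means = {gene: sum(vals) / len(vals) for gene, vals in by_gene.items()}
    return {gene: mean for gene, mean in means.items() if mean >= min_expr}
```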

Genome-wide survey of NMD 

Gene annotations and expression analysis in v6.5 mESCs were 
used to calculate the number of expressed genes (FPKM > 1) har- 
boring different NMD-inducing features. dEJ genes were defined as 
those that harbor at least one expressed annotated dEJ isoform 
(expression level > 1 FPKM and accounting for at least 10% of that 
of the entire gene). Long 3' UTR genes were defined as those whose
primary annotation harbors a 3' UTR > 2000 nt. uORF genes were
defined as those with at least one annotated uORF. tuORF genes
and CLIP genes were defined as described in tuORF methods and
UPF1 3' UTR binding methods. Predictive capacity of NMD features
was determined by calculating the fraction of genes harboring
a given feature that were up-regulated by a certain threshold in
at least two of three NMD inhibition experiments (without
significant down-regulation in the third) compared to all expressed
genes in the transcriptome that harbored this feature (significance 
calculated by two-tailed Fisher's exact test). In order to avoid 
double-counting of genes, features were called for genes using the 
following hierarchy: (1) UPF1 3' UTR binding; (2) presence of dEJ; 
(3) presence of a long 3' UTR; and (4) presence of a tuORF. 
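
The feature hierarchy amounts to assigning each gene the first feature it carries in a fixed order. A minimal sketch follows, with hypothetical feature labels and function names; the significance test is omitted.

```python
# Order follows the hierarchy stated in the text; the labels are hypothetical.
HIERARCHY = ("UPF1_3UTR_binding", "dEJ", "long_3UTR", "tuORF")


def primary_feature(features):
    """Return the highest-priority feature a gene carries, or None if it has none."""
    for name in HIERARCHY:
        if name in features:
            return name
    return None


def fraction_derepressed(gene_features, derepressed, feature):
    """Fraction of genes assigned to `feature` that are in the derepressed set.

    gene_features: {gene_id: set of feature labels}
    derepressed:   set of gene_ids passing the up-regulation criterion
    Returns None if no gene is assigned to the feature.
    """
    members = [g for g, feats in gene_features.items()
               if primary_feature(feats) == feature]
    if not members:
        return None
    return sum(g in derepressed for g in members) / len(members)
```

Under this scheme, a gene with both a dEJ and a long 3' UTR is counted only in the dEJ class, so no gene contributes to more than one fraction.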

uORF classification 

For classification of uORF translation status, ribosome footprinting 
reads were reduced to the codon occupied by the A site of the ri- 
bosome calibrated based on the pile up at known stop codons, 
similar to Ingolia et al. (2011). For each uORF, densities of mapped 
A sites were calculated within the uORF and the background un- 
translated region consisting of the 10 codons upstream of and 
downstream from the uORF. uORFs were required to be covered by 
at least one RNA-seq read to ensure they were spliced into the 
message. uORFs were called as translated when the footprint density
within the uORF was at least fivefold greater than the higher of either
the footprint density in the flanking 60 nt or a minimum threshold
coverage of two-thirds. If a uORF was not called as translated, it could be
called as confidently untranslated (an ntuORF) if the footprint 
density within the uORF was less than the higher of either the 
footprint density in the flanking 60 nt or one read per 60 nt. Genes 
were classified as tuORF-containing if they harbored a transcript 
with one or more tuORFs or as ntuORF-containing if they harbored 
one or more ntuORFs and no tuORFs as called in data from clone 
4.7 (shRNA Upf1-1).
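
The calling rules for a single uORF can be sketched as a small decision function. Densities are expressed here as A sites per nucleotide, which is an assumption, and the units of the minimum threshold for a translated call are not fully specified above, so that value is left as a required parameter (`translated_floor`) rather than given a default. All names are hypothetical.

```python
def classify_uorf(uorf_a_sites, uorf_len_nt, flank_a_sites, rna_reads,
                  translated_floor, flank_len_nt=60, fold=5.0,
                  untranslated_floor=1 / 60):
    """Call a uORF as translated ("tuORF"), untranslated ("ntuORF"), or neither.

    uorf_a_sites:       mapped A sites within the uORF
    flank_a_sites:      mapped A sites in the flanking background (60 nt total)
    rna_reads:          RNA-seq reads covering the uORF (at least one required)
    translated_floor:   minimum background density for a translated call
                        (caller-supplied; units assumed to be A sites per nt)
    untranslated_floor: one read per 60 nt, as stated in the text
    """
    if rna_reads < 1:
        return None  # no evidence that the uORF is present in the message
    uorf_density = uorf_a_sites / uorf_len_nt
    flank_density = flank_a_sites / flank_len_nt
    if uorf_density >= fold * max(flank_density, translated_floor):
        return "tuORF"
    if uorf_density < max(flank_density, untranslated_floor):
        return "ntuORF"
    return "unclassified"
```

A gene-level call would then follow from its uORFs: tuORF-containing if any transcript has a tuORF, and ntuORF-containing if it has one or more ntuORFs and no tuORFs.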

Additional procedural details are described in Supplemental 
Methods. 

Data access 

High-throughput sequencing data generated in this study have been
submitted to the NCBI Gene Expression Omnibus (GEO;
http://www.ncbi.nlm.nih.gov/geo/) under accession number GSE41785.